40 research outputs found

    An integrative gene selection with association analysis for microarray data classification

    Rising interest in integrative approaches has shifted gene selection from purely data-centric methods toward methods that incorporate additional biological knowledge. Integrative gene selection is viewed as a promising approach in microarray data classification because it takes into account the complex relationships among genes. However, in most existing methods, genes are still selected on expression values alone, and biological knowledge is integrated only at the end of the analysis to verify experimental results or to gain biological insights. This paper therefore proposes an integrative gene selection method, based on a filter method and association analysis, for selecting genes that are not only differentially expressed but also informative for classification. Association analysis is employed to integrate microarray data with multiple types of biological knowledge simultaneously and to identify groups of genes that frequently co-occur in target samples. The method was tested on four cancer-related datasets with two types of biological knowledge incorporated, namely Gene Ontology (GO) and KEGG Pathways (KEGG). The experimental results show that the recommended GO-based, KEGG-based, and GO-KEGG-based models outperformed the expression-only models, attaining better classification accuracies with fewer genes. The performance of the integrative models verified the efficiency and scalability of association analysis in mining microarray data.
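
    As a rough illustration of the association-analysis step described in this abstract, the sketch below counts itemsets that co-occur frequently in a small transaction database mixing genes with GO and KEGG annotations. It is not the paper's algorithm: it enumerates itemsets by brute force instead of using Apriori-style pruning, and the gene names, annotation IDs, transactions, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets whose support >= min_support.

    Support is the fraction of transactions that contain the whole itemset.
    Brute-force enumeration; fine for toy data, not for real microarrays.
    """
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical transactions: one per target-class sample, listing the
# genes flagged as expressed plus the annotations attached to them.
transactions = [
    {"geneA", "geneB", "GO:0006915"},
    {"geneA", "geneB", "hsa04110"},
    {"geneA", "geneC", "GO:0006915"},
    {"geneA", "geneB", "GO:0006915", "hsa04110"},
]

result = frequent_itemsets(transactions, min_support=0.75)
for itemset, support in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

    Putting genes and annotations in the same transaction is what lets a single mining pass surface gene groups together with the biological knowledge they share.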

    A Sign Language to Text Converter Using Leap Motion

    This paper presents a prototype that converts sign language into text. A Leap Motion controller was used as the interface for hand motion tracking, so the user does not need to wear any external instruments. Three recognition techniques were employed to measure the performance of the prototype: Geometric Template Matching, Artificial Neural Network, and Cross Correlation. The 26 letters of the American Sign Language alphabet were chosen for training and testing the prototype. The experimental results showed that Geometric Template Matching achieved the highest recognition accuracy of the three techniques.
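
    The abstract names cross correlation as one of the recognition techniques without giving details. The sketch below shows one common form of it, a zero-lag normalized cross-correlation between a sample feature vector and stored per-letter templates, under the assumption that each hand pose has already been reduced to a fixed-length list of numbers. The feature values and template labels are hypothetical, and this is not claimed to be the prototype's implementation.

```python
import math


def normalized_cross_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


def classify(sample, templates):
    """Return the label of the template most correlated with the sample."""
    return max(
        templates,
        key=lambda label: normalized_cross_correlation(sample, templates[label]),
    )


# Hypothetical feature vectors (for example, normalized fingertip distances).
templates = {
    "A": [0.1, 0.9, 0.2, 0.8, 0.1],
    "B": [0.9, 0.1, 0.8, 0.2, 0.9],
}
sample = [0.2, 0.8, 0.3, 0.7, 0.2]
print(classify(sample, templates))
```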

    An industrial IoT-based blockchain-enabled secure searchable encryption approach for healthcare systems using neural network

    The Internet of Things (IoT) refers to a network of physical things embedded with software, sensors, and other devices that exchange information with one another. Interconnecting devices raises challenges such as security, trustworthiness, reliability, and confidentiality. To address these issues, we have proposed a novel group theory (GT)-based binary spring search (BSS) algorithm which consists of a hybrid deep neural network approach. The proposed approach effectively detects intrusion within the IoT network. Initially, the privacy-preserving technology was implemented using a blockchain-based methodology. Security of patient health records (PHR) is the most critical aspect of cryptography over the Internet due to its value and importance, especially in the Internet of Medical Things (IoMT). Keyword search is one of the typical mechanisms used to access PHR from a database, but it is susceptible to various security vulnerabilities. Although blockchain-enabled healthcare systems provide security, they may still leave some loopholes in the existing state of the art. In the literature, blockchain-enabled frameworks have been presented to resolve those issues. However, these methods have primarily focused on data storage, with the blockchain used as a database. In this paper, blockchain as a distributed database is proposed with a homomorphic encryption technique to ensure secure search and keyword-based access to the database. Additionally, the proposed approach provides a secure key revocation mechanism and updates various policies accordingly. As a result, a secure patient healthcare data access scheme is devised, which integrates blockchain and trust chain to address the efficiency and security issues in the current schemes for sharing both types of digital healthcare data. Hence, our proposed approach provides more security, efficiency, and transparency with cost-effectiveness. 
We performed our simulations with the blockchain-based tool Hyperledger Fabric and used OriginLab for analysis and evaluation. We compared our results with the benchmark models. Our comparative analysis indicates that the proposed framework provides better security and a better searchable mechanism for healthcare systems.

    Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019

    Background: In an era of shifting global agendas and expanded emphasis on non-communicable diseases and injuries along with communicable diseases, sound evidence on trends by cause at the national level is essential. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) provides a systematic scientific assessment of published, publicly available, and contributed data on incidence, prevalence, and mortality for a mutually exclusive and collectively exhaustive list of diseases and injuries. Methods: GBD estimates incidence, prevalence, mortality, years of life lost (YLLs), years lived with disability (YLDs), and disability-adjusted life-years (DALYs) due to 369 diseases and injuries, for two sexes, and for 204 countries and territories. Input data were extracted from censuses, household surveys, civil registration and vital statistics, disease registries, health service use, air pollution monitors, satellite imaging, disease notifications, and other sources. Cause-specific death rates and cause fractions were calculated using the Cause of Death Ensemble model and spatiotemporal Gaussian process regression. Cause-specific deaths were adjusted to match the total all-cause deaths calculated as part of the GBD population, fertility, and mortality estimates. Deaths were multiplied by standard life expectancy at each age to calculate YLLs. A Bayesian meta-regression modelling tool, DisMod-MR 2.1, was used to ensure consistency between incidence, prevalence, remission, excess mortality, and cause-specific mortality for most causes. Prevalence estimates were multiplied by disability weights for mutually exclusive sequelae of diseases and injuries to calculate YLDs. We considered results in the context of the Socio-demographic Index (SDI), a composite indicator of income per capita, years of schooling, and fertility rate in females younger than 25 years. 
Uncertainty intervals (UIs) were generated for every metric using the 25th and 975th ordered 1000 draw values of the posterior distribution. Findings: Global health has steadily improved over the past 30 years as measured by age-standardised DALY rates. After taking into account population growth and ageing, the absolute number of DALYs has remained stable. Since 2010, the pace of decline in global age-standardised DALY rates has accelerated in age groups younger than 50 years compared with the 1990–2010 time period, with the greatest annualised rate of decline occurring in the 0–9-year age group. Six infectious diseases were among the top ten causes of DALYs in children younger than 10 years in 2019: lower respiratory infections (ranked second), diarrhoeal diseases (third), malaria (fifth), meningitis (sixth), whooping cough (ninth), and sexually transmitted infections (which, in this age group, is fully accounted for by congenital syphilis; ranked tenth). In adolescents aged 10–24 years, three injury causes were among the top causes of DALYs: road injuries (ranked first), self-harm (third), and interpersonal violence (fifth). Five of the causes that were in the top ten for ages 10–24 years were also in the top ten in the 25–49-year age group: road injuries (ranked first), HIV/AIDS (second), low back pain (fourth), headache disorders (fifth), and depressive disorders (sixth). In 2019, ischaemic heart disease and stroke were the top-ranked causes of DALYs in both the 50–74-year and 75-years-and-older age groups. Since 1990, there has been a marked shift towards a greater proportion of burden due to YLDs from non-communicable diseases and injuries. In 2019, there were 11 countries where non-communicable disease and injury YLDs constituted more than half of all disease burden. 
Decreases in age-standardised DALY rates have accelerated over the past decade in countries at the lower end of the SDI range, while improvements have started to stagnate or even reverse in countries with higher SDI. Interpretation: As disability becomes an increasingly large component of disease burden and a larger component of health expenditure, greater research and development investment is needed to identify new, more effective intervention strategies. With a rapidly ageing global population, the demands on health services to deal with disabling outcomes, which increase with age, will require policy makers to anticipate these changes. The mix of universal and more geographically specific influences on health reinforces the need for regular reporting on population health in detail and by underlying cause to help decision makers to identify success stories of disease control to emulate, as well as opportunities to improve. Funding: Bill & Melinda Gates Foundation. © 2020 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.
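
    The Methods part of this abstract describes three pieces of arithmetic: YLLs as deaths multiplied by standard life expectancy at each age, YLDs as prevalence multiplied by disability weights, and uncertainty intervals taken from the 25th and 975th ordered values of 1000 draws. The sketch below works through that arithmetic, with DALYs computed as YLLs plus YLDs following the standard definition. All input numbers, age labels, and sequela names are made up for illustration, and none of the GBD modelling (CODEm, DisMod-MR 2.1) is reproduced.

```python
def years_of_life_lost(deaths_by_age, life_expectancy_by_age):
    """YLLs: deaths at each age times standard life expectancy at that age."""
    return sum(d * life_expectancy_by_age[age] for age, d in deaths_by_age.items())


def years_lived_with_disability(prevalence_by_sequela, disability_weight):
    """YLDs: prevalence of each sequela times its disability weight."""
    return sum(p * disability_weight[s] for s, p in prevalence_by_sequela.items())


def uncertainty_interval(draws):
    """Return the 25th and 975th ordered values of exactly 1000 draws."""
    if len(draws) != 1000:
        raise ValueError("expected 1000 draws")
    ordered = sorted(draws)
    return ordered[24], ordered[974]  # 1-indexed positions 25 and 975


# Hypothetical inputs for a single cause in a single location-year.
deaths = {"60": 10, "70": 5}
life_expectancy = {"60": 25.0, "70": 17.0}
prevalence = {"mild": 100, "severe": 10}
weights = {"mild": 0.05, "severe": 0.5}

ylls = years_of_life_lost(deaths, life_expectancy)
ylds = years_lived_with_disability(prevalence, weights)
dalys = ylls + ylds
print(ylls, ylds, dalys)

# Hypothetical draws: the integers 1..1000 in reverse order.
lower, upper = uncertainty_interval(list(range(1000, 0, -1)))
print(lower, upper)
```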

    Measuring universal health coverage based on an index of effective coverage of health services in 204 countries and territories, 1990–2019 : A systematic analysis for the Global Burden of Disease Study 2019

    Background Achieving universal health coverage (UHC) involves all people receiving the health services they need, of high quality, without experiencing financial hardship. Making progress towards UHC is a policy priority for both countries and global institutions, as highlighted by the agenda of the UN Sustainable Development Goals (SDGs) and WHO's Thirteenth General Programme of Work (GPW13). Measuring effective coverage at the health-system level is important for understanding whether health services are aligned with countries' health profiles and are of sufficient quality to produce health gains for populations of all ages. Methods Based on the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019, we assessed UHC effective coverage for 204 countries and territories from 1990 to 2019. Drawing from a measurement framework developed through WHO's GPW13 consultation, we mapped 23 effective coverage indicators to a matrix representing health service types (eg, promotion, prevention, and treatment) and five population-age groups spanning from reproductive and newborn to older adults (≥65 years). Effective coverage indicators were based on intervention coverage or outcome-based measures such as mortality-to-incidence ratios to approximate access to quality care; outcome-based measures were transformed to values on a scale of 0–100 based on the 2·5th and 97·5th percentile of location-year values. We constructed the UHC effective coverage index by weighting each effective coverage indicator relative to its associated potential health gains, as measured by disability-adjusted life-years for each location-year and population-age group. For three tests of validity (content, known-groups, and convergent), UHC effective coverage index performance was generally better than that of other UHC service coverage indices from WHO (ie, the current metric for SDG indicator 3.8.1 on UHC service coverage), the World Bank, and GBD 2017. 
We quantified frontiers of UHC effective coverage performance on the basis of pooled health spending per capita, representing UHC effective coverage index levels achieved in 2019 relative to country-level government health spending, prepaid private expenditures, and development assistance for health. To assess current trajectories towards the GPW13 UHC billion target—1 billion more people benefiting from UHC by 2023—we estimated additional population equivalents with UHC effective coverage from 2018 to 2023. Findings Globally, performance on the UHC effective coverage index improved from 45·8 (95% uncertainty interval 44·2–47·5) in 1990 to 60·3 (58·7–61·9) in 2019, yet country-level UHC effective coverage in 2019 still spanned from 95 or higher in Japan and Iceland to lower than 25 in Somalia and the Central African Republic. Since 2010, sub-Saharan Africa showed accelerated gains on the UHC effective coverage index (at an average increase of 2·6% [1·9–3·3] per year up to 2019); by contrast, most other GBD super-regions had slowed rates of progress in 2010–2019 relative to 1990–2010. Many countries showed lagging performance on effective coverage indicators for non-communicable diseases relative to those for communicable diseases and maternal and child health, despite non-communicable diseases accounting for a greater proportion of potential health gains in 2019, suggesting that many health systems are not keeping pace with the rising non-communicable disease burden and associated population health needs. In 2019, the UHC effective coverage index was associated with pooled health spending per capita (r=0·79), although countries across the development spectrum had much lower UHC effective coverage than is potentially achievable relative to their health spending. 
Under maximum efficiency of translating health spending into UHC effective coverage performance, countries would need to reach $1398 pooled health spending per capita (US$ adjusted for purchasing power parity) in order to achieve 80 on the UHC effective coverage index. From 2018 to 2023, an estimated 388·9 million (358·6–421·3) more population equivalents would have UHC effective coverage, falling well short of the GPW13 target of 1 billion more people benefiting from UHC during this time. Current projections point to an estimated 3·1 billion (3·0–3·2) population equivalents still lacking UHC effective coverage in 2023, with nearly a third (968·1 million [903·5–1040·3]) residing in south Asia. Interpretation The present study demonstrates the utility of measuring effective coverage and its role in supporting improved health outcomes for all people, which is the ultimate goal of UHC and its achievement. Global ambitions to accelerate progress on UHC service coverage are increasingly unlikely unless concerted action on non-communicable diseases occurs and countries can better translate health spending into improved performance. Focusing on effective coverage and accounting for the world's evolving health needs lays the groundwork for better understanding how close, or how far, all populations are in benefiting from UHC.
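
    Two steps in the Methods of this abstract lend themselves to a small worked example: rescaling an outcome-based measure to 0–100 using the 2·5th and 97·5th percentiles of location-year values, and combining indicators into one index weighted by potential health gains. The sketch below assumes linear-interpolation percentiles, clipping to the 0–100 range, a flag to invert measures where lower is better, and a plain weighted mean. These are illustrative assumptions rather than the study's exact procedure, and the indicator names and numbers are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile of values, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def scale_0_100(value, all_values, higher_is_better=True):
    """Rescale value to 0-100 between the 2.5th and 97.5th percentiles."""
    low = percentile(all_values, 2.5)
    high = percentile(all_values, 97.5)
    score = 100 * (value - low) / (high - low)
    score = max(0.0, min(100.0, score))  # clip values outside the bounds
    return score if higher_is_better else 100 - score


def weighted_index(scores, weights):
    """Weighted mean of indicator scores (weights: potential health gains)."""
    total = sum(weights.values())
    return sum(scores[k] * weights[k] for k in scores) / total


# Hypothetical location-year values for one outcome-based measure.
location_year_values = list(range(0, 201))
print(scale_0_100(100, location_year_values))

# Hypothetical indicator scores and potential-health-gain weights.
scores = {"indicator_a": 50.0, "indicator_b": 100.0}
gains = {"indicator_a": 1.0, "indicator_b": 3.0}
print(weighted_index(scores, gains))
```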

    The global burden of cancer attributable to risk factors, 2010-19 : a systematic analysis for the Global Burden of Disease Study 2019

    Background Understanding the magnitude of cancer burden attributable to potentially modifiable risk factors is crucial for development of effective prevention and mitigation strategies. We analysed results from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019 to inform cancer control planning efforts globally. Methods The GBD 2019 comparative risk assessment framework was used to estimate cancer burden attributable to behavioural, environmental and occupational, and metabolic risk factors. A total of 82 risk-outcome pairs were included on the basis of the World Cancer Research Fund criteria. Estimated cancer deaths and disability-adjusted life-years (DALYs) in 2019 and change in these measures between 2010 and 2019 are presented. Findings Globally, in 2019, the risk factors included in this analysis accounted for 4.45 million (95% uncertainty interval 4.01-4.94) deaths and 105 million (95.0-116) DALYs for both sexes combined, representing 44.4% (41.3-48.4) of all cancer deaths and 42.0% (39.1-45.6) of all DALYs. There were 2.88 million (2.60-3.18) risk-attributable cancer deaths in males (50.6% [47.8-54.1] of all male cancer deaths) and 1.58 million (1.36-1.84) risk-attributable cancer deaths in females (36.3% [32.5-41.3] of all female cancer deaths). The leading risk factors at the most detailed level globally for risk-attributable cancer deaths and DALYs in 2019 for both sexes combined were smoking, followed by alcohol use and high BMI. Risk-attributable cancer burden varied by world region and Socio-demographic Index (SDI), with smoking, unsafe sex, and alcohol use being the three leading risk factors for risk-attributable cancer DALYs in low SDI locations in 2019, whereas DALYs in high SDI locations mirrored the top three global risk factor rankings. 
From 2010 to 2019, global risk-attributable cancer deaths increased by 20.4% (12.6-28.4) and DALYs by 16.8% (8.8-25.0), with the greatest percentage increase in metabolic risks (34.7% [27.9-42.8] and 33.3% [25.8-42.0]). Interpretation The leading risk factors contributing to global cancer burden in 2019 were behavioural, whereas metabolic risk factors saw the largest increases between 2010 and 2019. Reducing exposure to these modifiable risk factors would decrease cancer mortality and DALY rates worldwide, and policies should be tailored appropriately to local cancer risk factor burden. Copyright (C) 2022 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license. Peer reviewed

    Cancer Incidence, Mortality, Years of Life Lost, Years Lived With Disability, and Disability-Adjusted Life Years for 29 Cancer Groups From 2010 to 2019: A Systematic Analysis for the Global Burden of Disease Study 2019.

    The Global Burden of Diseases, Injuries, and Risk Factors Study 2019 (GBD 2019) provided systematic estimates of incidence, morbidity, and mortality to inform local and international efforts toward reducing cancer burden. The objective was to estimate cancer burden and trends globally for 204 countries and territories and by Sociodemographic Index (SDI) quintiles from 2010 to 2019. The GBD 2019 estimation methods were used to describe cancer incidence, mortality, years lived with disability, years of life lost, and disability-adjusted life years (DALYs) in 2019 and over the past decade. Estimates are also provided by quintiles of the SDI, a composite measure of educational attainment, income per capita, and total fertility rate for those younger than 25 years. Estimates include 95% uncertainty intervals (UIs). In 2019, there were an estimated 23.6 million (95% UI, 22.2-24.9 million) new cancer cases (17.2 million when excluding nonmelanoma skin cancer) and 10.0 million (95% UI, 9.36-10.6 million) cancer deaths globally, with an estimated 250 million (235-264 million) DALYs due to cancer. Since 2010, these represented a 26.3% (95% UI, 20.3%-32.3%) increase in new cases, a 20.9% (95% UI, 14.2%-27.6%) increase in deaths, and a 16.0% (95% UI, 9.3%-22.8%) increase in DALYs. Among 22 groups of diseases and injuries in the GBD 2019 study, cancer was second only to cardiovascular diseases for the number of deaths, years of life lost, and DALYs globally in 2019. Cancer burden differed across SDI quintiles. The proportion of years lived with disability that contributed to DALYs increased with SDI, ranging from 1.4% (1.1%-1.8%) in the low SDI quintile to 5.7% (4.2%-7.1%) in the high SDI quintile. While the high SDI quintile had the highest number of new cases in 2019, the middle SDI quintile had the highest number of cancer deaths and DALYs. From 2010 to 2019, the largest percentage increase in the numbers of cases and deaths occurred in the low and low-middle SDI quintiles. 
The results of this systematic analysis suggest that the global burden of cancer is substantial and growing, with burden differing by SDI. These results provide comprehensive and comparable estimates that can potentially inform efforts toward equitable cancer control around the world. Funding/Support: The Institute for Health Metrics and Evaluation received funding from the Bill & Melinda Gates Foundation and the American Lebanese Syrian Associated Charities. Dr Aljunid acknowledges the Department of Health Policy and Management of Kuwait University and the International Centre for Casemix and Clinical Coding, National University of Malaysia for the approval and support to participate in this research project. Dr Bhaskar acknowledges institutional support from the NSW Ministry of Health and NSW Health Pathology. Dr Bärnighausen was supported by the Alexander von Humboldt Foundation through the Alexander von Humboldt Professor award, which is funded by the German Federal Ministry of Education and Research. Dr Braithwaite acknowledges funding from the National Institutes of Health/National Cancer Institute. Dr Conde acknowledges financial support from the European Research Council ERC Starting Grant agreement No 848325. Dr Costa acknowledges her grant (SFRH/BHD/110001/2015), received by Portuguese national funds through Fundação para a Ciência e Tecnologia, IP under the Norma Transitória grant DL57/2016/CP1334/CT0006. Dr Ghith acknowledges support from a grant from Novo Nordisk Foundation (NNF16OC0021856). Dr Glasbey is supported by a National Institute of Health Research Doctoral Research Fellowship. Dr Vivek Kumar Gupta acknowledges funding support from National Health and Medical Research Council Australia. Dr Haque thanks Jazan University, Saudi Arabia for providing access to the Saudi Digital Library for this research study. 
Drs Herteliu, Pana, and Ausloos are partially supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNDS-UEFISCDI, project number PN-III-P4-ID-PCCF-2016-0084. Dr Hugo received support from the Higher Education Improvement Coordination of the Brazilian Ministry of Education for a sabbatical period at the Institute for Health Metrics and Evaluation, between September 2019 and August 2020. Dr Sheikh Mohammed Shariful Islam acknowledges funding by a National Heart Foundation of Australia Fellowship and National Health and Medical Research Council Emerging Leadership Fellowship. Dr Jakovljevic acknowledges support through grant OI 175014 of the Ministry of Education Science and Technological Development of the Republic of Serbia. Dr Katikireddi acknowledges funding from a NHS Research Scotland Senior Clinical Fellowship (SCAF/15/02), the Medical Research Council (MC_UU_00022/2), and the Scottish Government Chief Scientist Office (SPHSU17). Dr Md Nuruzzaman Khan acknowledges the support of Jatiya Kabi Kazi Nazrul Islam University, Bangladesh. Dr Yun Jin Kim was supported by the Research Management Centre, Xiamen University Malaysia (XMUMRF/2020-C6/ITCM/0004). Dr Koulmane Laxminarayana acknowledges institutional support from Manipal Academy of Higher Education. Dr Landires is a member of the Sistema Nacional de Investigación, which is supported by Panama’s Secretaría Nacional de Ciencia, Tecnología e Innovación. Dr Loureiro was supported by national funds through Fundação para a Ciência e Tecnologia under the Scientific Employment Stimulus–Institutional Call (CEECINST/00049/2018). Dr Molokhia is supported by the National Institute for Health Research Biomedical Research Center at Guy’s and St Thomas’ National Health Service Foundation Trust and King’s College London. Dr Moosavi appreciates NIGEB's support. Dr Pati acknowledges support from the SIAN Institute, Association for Biodiversity Conservation & Research. 
Dr Rakovac acknowledges a grant from the government of the Russian Federation in the context of World Health Organization Noncommunicable Diseases Office. Dr Samy was supported by a fellowship from the Egyptian Fulbright Mission Program. Dr Sheikh acknowledges support from Health Data Research UK. Drs Adithi Shetty and Unnikrishnan acknowledge support given by Kasturba Medical College, Mangalore, Manipal Academy of Higher Education. Dr Pavanchand H. Shetty acknowledges Manipal Academy of Higher Education for their research support. Dr Diego Augusto Santos Silva was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil Finance Code 001 and is supported in part by CNPq (302028/2018-8). Dr Zhu acknowledges the Cancer Prevention and Research Institute of Texas grant RP210042

    Integrated framework with association analysis for gene selection in microarray data classification

    Microarray data classification is one of the major interests in health informatics and aims at discovering hidden patterns in gene expression profiles. The main challenge in building such a classification system is the curse of dimensionality. Gene selection is therefore an indispensable task in microarray data classification, used to identify smaller sets of relevant genes. However, most existing gene selection methods are statistical analyses that rely purely on gene expression values to identify differentially expressed genes. As a result, the selected genes might be false positives and might not be biologically meaningful. The purpose of this study was to integrate microarray datasets with additional biological information for selecting genes that are not only differentially expressed but also informative for classifiers. To achieve that, an integrated framework with a new gene selection method was developed to improve classification performance in terms of accuracy and number of selected genes. The proposed gene selection method combined the strengths of both a filter method and association analysis to identify a set of discriminative and informative genes. Association analysis was employed to integrate more than one type of biological information in the same transaction database, and also to identify groups of genes that frequently co-occur in target samples. The existing association algorithm for mining frequent itemsets was modified so that genes in each itemset were sorted according to their discriminative scores rather than in lexicographic order. In addition, discriminative scores were used to compute the interestingness of frequent itemsets before ranking them. The proposed integrated framework was tested on colon cancer, leukemia, breast cancer and lung cancer microarray datasets. 
Two types of biological information were incorporated in the selection process, namely the Gene Ontology (GO) and the KEGG Pathways (KEGG). The experimental results showed that the recommended GO-based, KEGG-based, and GO-KEGG-based models outperformed the expression-only models, attaining better classification accuracies with fewer genes. In the experiments, the leukemia and lung cancer datasets achieved 100% accuracy with all classifiers using as few as three selected genes. The colon cancer and breast cancer datasets achieved better classification accuracies than the previous integrated method, at 95.16% and 95.88% respectively. Moreover, the proposed integrated framework proved able to build informative and interpretable microarray classification models. The selected genes can be traced back to their functional annotations and association groups for reasoning and for creating new hypotheses for future investigation.